AITopics | heavy ball momentum

Neural networks are trained by minimizing loss functions with gradient-based optimizers. Cohen et al. [2021] observed that full-batch gradient descent operates at the edge of stability (EoS): the largest eigenvalue of the Hessian, called the sharpness, first rises (a phase called progressive sharpening) and then hovers at the stability threshold 2/η, where η is the learning rate. Cohen et al. [2022] extended this picture to momentum methods and adaptive gradient methods, showing that each optimizer exhibits its own edge of stability. Rather than hovering at 2/η, the relevant quantity, the preconditioned sharpness, hovers at a threshold that depends on the optimizer and its hyperparameters (Table 2). In practice, the dominant optimizer in machine learning is Adam [Kingma and Ba, 2015], which differs from gradient descent in two respects.
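
To make the 2/η threshold concrete, here is a minimal sketch (not taken from the cited papers) that runs gradient descent and heavy ball momentum on the one-dimensional quadratic loss L(x) = 0.5·λ·x², whose sharpness is exactly λ. The function name `run_quadratic`, the heavy ball form x_next = x - η·grad + β·(x - x_prev), and all numeric values are assumptions chosen for illustration. For this toy loss, plain gradient descent is stable only when λ < 2/η, and this form of heavy ball is stable only when λ < (2 + 2β)/η.

```python
def run_quadratic(sharpness, lr, beta=0.0, steps=200, x0=1.0):
    """Heavy ball on L(x) = 0.5 * sharpness * x**2 (beta=0 is plain gradient descent).

    Returns |x| after `steps` updates, so small values mean the run was stable.
    """
    x_prev, x = x0, x0
    for _ in range(steps):
        grad = sharpness * x                      # dL/dx for the quadratic
        x_next = x - lr * grad + beta * (x - x_prev)
        x_prev, x = x, x_next
    return abs(x)


lr = 0.1
beta = 0.9
gd_threshold = 2.0 / lr                   # stability threshold for gradient descent
hb_threshold = (2.0 + 2.0 * beta) / lr    # stability threshold for heavy ball (toy quadratic)

# Sharpness just below and just above each threshold.
below_gd = run_quadratic(0.95 * gd_threshold, lr)
above_gd = run_quadratic(1.05 * gd_threshold, lr)
below_hb = run_quadratic(0.95 * hb_threshold, lr, beta=beta)
above_hb = run_quadratic(1.05 * hb_threshold, lr, beta=beta)

print("GD:", below_gd, above_gd)
print("HB:", below_hb, above_hb)
```

In both cases the iterate shrinks when the sharpness sits slightly below the threshold and blows up when it sits slightly above, which is the sense in which each optimizer has its own stability threshold.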

artificial intelligence, equation, machine learning, (15 more...)

arXiv.org Machine Learning

2605.06821

Country: North America (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)

019f8b946a256d9357eadc5ace2c8678-Supplemental.pdf

Neural Information Processing Systems, Apr-24-2026, 10:10:26 GMT

artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

019f8b946a256d9357eadc5ace2c8678-Paper.pdf

Neural Information Processing Systems, Apr-24-2026, 10:10:22 GMT

artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

b166b57d195370cd41f80dd29ed523d9-Supplemental.pdf

Neural Information Processing Systems, Feb-10-2026, 18:28:27 GMT

convergence, pd error, step size, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Russia (0.04)
Asia > Russia (0.04)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)

Industry:

Education (0.46)
Leisure & Entertainment (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Heavy Ball Momentum for Conditional Gradient

Neural Information Processing Systems, Dec-24-2025, 18:33:14 GMT

Conditional gradient algorithms, also known as Frank-Wolfe (FW) algorithms, have well-documented merits in machine learning and signal processing applications. Unlike projection-based methods, momentum cannot improve the convergence rate of FW, in general. This limitation motivates the present work, which deals with heavy ball momentum and its impact on FW. Specifically, it is established that heavy ball offers a unifying perspective on the primal-dual (PD) convergence, and enjoys a tighter per-iteration PD error rate for multiple choices of step sizes, where the PD error can serve as the stopping criterion in practice. In addition, it is asserted that restart, a scheme typically employed jointly with Nesterov's momentum, can further tighten this PD error bound. Numerical results demonstrate the usefulness of heavy ball momentum in FW iterations.
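
The abstract does not spell out the update rule. As an illustration only, the sketch below assumes one common way to add heavy ball momentum to FW: the linear minimization step is fed a running weighted average of past gradients rather than the current gradient alone. The test problem (least squares over the probability simplex), the schedule 2/(k+2) for both the averaging weight and the step size, and the names `fw_heavy_ball` and `objective` are all assumptions made for this example. The gap it returns is the standard Frank-Wolfe gap at the final iterate, used here as a stand-in for a primal-dual stopping criterion, and may differ from the PD error defined in the paper.

```python
import numpy as np


def fw_heavy_ball(A, b, iters=500):
    """Frank-Wolfe over the probability simplex with averaged-gradient momentum.

    Minimizes f(x) = 0.5 * ||A x - b||^2. Assumed (illustrative) update:
        g <- (1 - step) * g + step * grad f(x)
        v <- simplex vertex minimizing <g, v>
        x <- (1 - step) * x + step * v
    with step = 2 / (k + 2).
    """
    n = A.shape[1]
    x = np.full(n, 1.0 / n)            # feasible start: uniform point on the simplex
    g = np.zeros(n)                    # momentum buffer holding the averaged gradient
    for k in range(iters):
        step = 2.0 / (k + 2.0)
        grad = A.T @ (A @ x - b)
        g = (1.0 - step) * g + step * grad
        v = np.zeros(n)
        v[np.argmin(g)] = 1.0          # linear minimization oracle on the simplex
        x = (1.0 - step) * x + step * v
    grad = A.T @ (A @ x - b)
    fw_gap = float(grad @ x - grad.min())   # standard FW gap at the final iterate
    return x, fw_gap


def objective(A, b, x):
    r = A @ x - b
    return 0.5 * float(r @ r)


rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
x_true = np.zeros(10)
x_true[[1, 4]] = [0.3, 0.7]            # ground truth lies on the simplex
b = A @ x_true

x0 = np.full(10, 0.1)                  # the same uniform start used inside fw_heavy_ball
x_hat, gap = fw_heavy_ball(A, b)
print("objective at start:", objective(A, b, x0))
print("objective at end:  ", objective(A, b, x_hat))
print("FW gap at end:     ", gap)
```

Because every iterate is a convex combination of simplex vertices, the method stays feasible without any projection, which is the main practical appeal of FW over projection-based methods.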

conditional gradient, heavy ball momentum, name change, (4 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Sports > Tennis (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.80)

b166b57d195370cd41f80dd29ed523d9-Supplemental.pdf

Neural Information Processing Systems, Aug-16-2025, 22:20:17 GMT

artificial intelligence, machine learning, step size, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Russia (0.04)
Asia > Russia (0.04)
(2 more...)

Industry:

Education (0.46)
Leisure & Entertainment (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Heavy Ball Momentum for Conditional Gradient

Neural Information Processing Systems, Aug-16-2025, 22:20:13 GMT

Unlike projection-based methods, momentum cannot improve the convergence rate of FW, in general. This limitation motivates the present work, which deals with heavy ball momentum and its impact on FW.

artificial intelligence, machine learning, step size, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Russia (0.04)
Asia > Russia (0.04)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)

Industry: Leisure & Entertainment > Sports > Tennis (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Continuous-Time Analysis of Heavy Ball Momentum in Min-Max Games

Feng, Yi, Fujii, Kaito, Skoulakis, Stratis, Wang, Xiao, Cevher, Volkan

arXiv.org Artificial Intelligence, May-27-2025

Since Polyak's pioneering work, heavy ball (HB) momentum has been widely studied in minimization. However, its role in min-max games remains largely unexplored. Because HB is a key component of practical min-max algorithms like Adam, this gap limits their effectiveness. In this paper, we present a continuous-time analysis for HB with simultaneous and alternating update schemes in min-max games. Locally, we prove that smaller momentum enhances algorithmic stability by enabling local convergence across a wider range of step sizes, with alternating updates generally converging faster. Globally, we study the implicit regularization of HB and find that smaller momentum guides algorithm trajectories towards shallower-slope regions of the loss landscape, with alternating updates amplifying this effect. Surprisingly, all these phenomena differ from those observed in minimization, where larger momentum yields similar effects. Our results reveal fundamental differences between HB in min-max games and minimization, and numerical experiments further validate our theoretical results.
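
To fix ideas about the two update schemes, the sketch below applies discrete-time heavy ball updates to the bilinear toy game f(x, y) = x·y, where x minimizes and y maximizes. The game, the step size, and the function name `hb_minmax` are assumptions for illustration and are not taken from the paper, whose analysis is in continuous time. In the simultaneous scheme both players use gradients at the current iterate; in the alternating scheme the y-player sees the already-updated x. The run shown uses zero momentum, the one setting where the behaviour on this game is easy to verify by hand: simultaneous updates spiral away from the equilibrium (0, 0), while alternating updates stay on a bounded orbit.

```python
import math


def hb_minmax(step, beta, alternating, iters=200, x0=1.0, y0=1.0):
    """Heavy ball descent-ascent on f(x, y) = x * y.

    Returns the final distance from the equilibrium (0, 0).
    """
    x_prev, y_prev, x, y = x0, y0, x0, y0
    for _ in range(iters):
        x_new = x - step * y + beta * (x - x_prev)        # descent in x, df/dx = y
        x_seen = x_new if alternating else x              # what the y-player observes
        y_new = y + step * x_seen + beta * (y - y_prev)   # ascent in y, df/dy = x
        x_prev, y_prev, x, y = x, y, x_new, y_new
    return math.hypot(x, y)


sim = hb_minmax(step=0.1, beta=0.0, alternating=False)
alt = hb_minmax(step=0.1, beta=0.0, alternating=True)
print("simultaneous:", sim)
print("alternating: ", alt)
```

Changing `beta` (including to negative values) lets a reader probe how momentum interacts with each scheme on this toy game; no claim is made here that the toy reproduces the paper's results.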

artificial intelligence, continuous-time analysis, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2505.19537

Country:

Europe (0.67)
North America > United States (0.45)
Asia > Japan (0.27)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment > Games (0.67)
Leisure & Entertainment > Sports > Tennis (0.64)

Technology:

Information Technology > Game Theory (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Heavy Ball Momentum for Conditional Gradient

Neural Information Processing Systems, Jan-18-2025, 18:22:37 GMT

Conditional gradient algorithms, also known as Frank-Wolfe (FW) algorithms, have well-documented merits in machine learning and signal processing applications. Unlike projection-based methods, momentum cannot improve the convergence rate of FW, in general. This limitation motivates the present work, which deals with heavy ball momentum and its impact on FW. Specifically, it is established that heavy ball offers a unifying perspective on the primal-dual (PD) convergence, and enjoys a tighter per-iteration PD error rate for multiple choices of step sizes, where the PD error can serve as the stopping criterion in practice. In addition, it is asserted that restart, a scheme typically employed jointly with Nesterov's momentum, can further tighten this PD error bound.

conditional gradient, heavy ball momentum, iteration, (1 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Sports > Tennis (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.88)

Analytical Study of Momentum-Based Acceleration Methods in Paradigmatic High-Dimensional Non-Convex Problems

Neural Information Processing Systems, Mar-3-2024, 06:01:36 GMT

The optimization step in many machine learning problems rarely relies on vanilla gradient descent; instead, it is common practice to use momentum-based accelerated methods. Despite these algorithms being widely applied to arbitrary loss functions, their behaviour in generically non-convex, high-dimensional landscapes is poorly understood. In this work, we use dynamical mean field theory techniques to describe analytically the average dynamics of these methods in a prototypical non-convex model: the (spiked) matrix-tensor model. We derive a closed set of equations that describe the behaviour of heavy-ball momentum and Nesterov acceleration in the infinite-dimensional limit. By numerical integration of these equations, we observe that these methods speed up the dynamics but do not improve the algorithmic threshold with respect to gradient descent in the spiked model.

algorithm, equation, gradient descent, (13 more...)

Neural Information Processing Systems

Country: